Regular Paper

Programmable Architectures and Design Methods for
Two-Variable Numeric Function Generators*1

Shinobu Nagayama,†1 Tsutomu Sasao†2
and Jon T. Butler†3

This paper proposes programmable architectures and design methods for
numeric function generators (NFGs) of two-variable functions. To realize a
two-variable function in hardware, we partition a given domain of the function
into segments, and approximate the function by a polynomial in each segment.
This paper introduces two planar segmentation algorithms that efficiently
partition a domain of a two-variable function. This paper also introduces a
design method for symmetric two-variable functions (i.e., f(X,Y) = f(Y,X)).
This method can reduce the memory size needed for symmetric functions by
nearly half with a small speed penalty. The proposed architectures allow a
systematic design of various two-variable functions. We compare our approach
with one based on a one-variable NFG. FPGA implementation results show that,
for a complicated function, our NFG achieves 57% of the memory size and 60% of
the delay time of a circuit designed based on a one-variable NFG.

1.  Introduction 

The ability to compute numeric functions at a high speed is important in
many applications 12), including 3D computer graphics, hardware accelerators
for technical computing packages, direct digital frequency synthesizers 4), and
digital signal processing. Various design methods for numeric function generators
(NFGs) have been devised for numeric functions of one variable 5),10),14),15),18)-20).
Only a few methods exist for multi-variable functions (e.g., $\sqrt{X^2 + Y^2 + Z^2}$ and
$\arctan(X/Y)$) 6),7),22). However, these methods are function-specific; different
functions require different methods. As far as we know, no systematic design
method exists for generic multi-variable functions.

A straightforward design method for an arbitrary multi-variable function is to
use a single memory in which the address is a combination of the values of the
variables and the content of that address is the corresponding value of the
function. This method produces a fast implementation, but requires a
$2^{mn}$-word memory to implement an m-variable function with n bits for each
variable. Thus, unlike one-variable functions, even for a computation with a
small number of bits, this method is impractical because of the large memory size.
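The growth of the single-memory approach can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `direct_table_words` is hypothetical and not part of the paper.

```python
def direct_table_words(num_vars: int, bits_per_var: int) -> int:
    """Words in a single look-up table addressed by all input bits (2^(m*n))."""
    return 2 ** (num_vars * bits_per_var)

# One 16-bit variable versus two 16-bit variables.
one_var = direct_table_words(1, 16)   # 2**16 words
two_var = direct_table_words(2, 16)   # 2**32 words
```

Adding a second variable squares the number of words, which is why a direct table stops being practical even at modest bit widths.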

To produce a practical implementation, multi-variable functions are often
designed in a conventional (trivial) manner that uses a combination of
one-variable function generators, multipliers, and adders 6),7). For example, the
function $\sqrt{X^2 + Y^2 + Z^2}$ can be realized using three circuits, each
realizing $x^2$, two adders, and a square root circuit. This design method may
require only a small memory size. However, depending on the function
implemented, it can produce a slow implementation because of long path delays.
Such circuits also make error analysis harder; that is, guaranteeing output
accuracy becomes harder. In addition, there are many multi-variable functions
that cannot be decomposed into one-variable functions, such as probability
distributions that are functions of the random variable and a parameter, like
variance.

This paper proposes a systematic design method for two-variable functions.
Since our design method is based on a piecewise polynomial approximation,
architectures are simple even for complicated functions. To approximate a given
function using piecewise polynomials, we introduce two planar segmentation
algorithms that efficiently partition a given domain of a two-variable function.
We also introduce programmable architectures that can realize a wide range of
two-variable functions.

The rest of this paper is organized as follows: Section 2 introduces the number
representation and the decision diagrams used in this paper. Section 3 presents
two planar segmentation algorithms and a polynomial approximation method
using bilinear interpolation. Section 4 presents programmable architectures for
two-variable functions. Section 5 presents an architecture and a design method
for symmetric two-variable functions. Section 6 evaluates the performance of our
segmentation algorithms and architectures for two-variable functions. Section 7
concludes the paper. An error analysis for our NFGs is omitted because
it is almost the same as that in Refs. 14), 19).

2.  Preliminaries 

2.1  Number  Representation  and  Errors 

Definition 1 A numeric function generator (NFG) is a logic circuit that
computes approximated values for a numeric (real) function within some given
acceptable error $\varepsilon$. A one-variable NFG is a logic circuit for a
one-variable numeric function f(X), whose input is X, and whose output is an
approximated value for f(X). A two-variable NFG is a logic circuit for a
two-variable numeric function f(X,Y), whose inputs are X and Y, and whose
output is an approximated value for f(X,Y).

Definition 2 A value X represented by the binary fixed-point representation
is denoted by

$$X = (x_{l-1}\, x_{l-2}\, \ldots\, x_1\, x_0 .\, x_{-1}\, x_{-2}\, \ldots\, x_{-m})_2,$$

where $x_i \in \{0,1\}$, l is the number of bits in the integer part, and m is the
number of bits in the fractional part. Each bit $x_i$ contributes $2^i x_i$ to the
value of X, except $x_{l-1}$, which contributes $-2^{l-1} x_{l-1}$. That is, the
fixed-point representation is in two's complement.
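As a small illustration of Definition 2, the following sketch decodes a two's-complement fixed-point bit string. The helper name `fixed_point_value` and the string-based input format are assumptions made for this example only.

```python
def fixed_point_value(bits: str, frac_bits: int) -> float:
    """Decode a two's-complement fixed-point bit string (MSB first).

    `bits` holds the l integer bits followed by the m = frac_bits
    fractional bits, without the binary point.
    """
    raw = int(bits, 2)
    if bits[0] == "1":
        # The top bit contributes -2^(l-1) instead of +2^(l-1).
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)
```

For example, with l = 2 and m = 2, the string `0110` decodes to 1.5, and `1110` decodes to -0.5, because the top bit contributes -2 instead of +2.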

Definition 3 Error is the absolute difference between the exact value and
the value produced by the hardware. Acceptable error is the maximum error
that an NFG may assume; it is usually a specification to be satisfied by the
hardware. Approximation error is the error caused by a function approximation.
Acceptable approximation error is the maximum approximation error that a
function approximation may assume. Rounding error is the error caused by a
binary fixed-point representation.

Definition 4 Accuracy is the number of bits in the fractional part of a binary
fixed-point representation. m-bit accuracy specifies that m bits are used to
represent the fractional part of the number. When the maximum error is $2^{-m}$,
the accuracy is no greater than 1 unit in the last place (ULP) 12). In this
paper, an m-bit accuracy NFG is an NFG with an m-bit fractional part of the
inputs, an m-bit fractional part of the output, and a 1 ULP error.


2.2  Decision  Diagrams 

The  proposed  design  uses  binary  decision  diagrams. 

Definition  5  A  binary  decision  diagram  (BDD)2hW  is  a  rooted  directed 
acyclic  graph  (DAG)  representing  a  logic  function.  The  BDD  is  obtained  by 
recursively applying the Shannon expansion $f = \bar{x}_i f_0 + x_i f_1$ to the logic
function, where $f$, $f_0$, and $f_1$ are represented by nodes. There are two types
of nodes, terminal nodes that are labeled by the two function values, 0 and 1, and
non-terminal nodes that are labeled by variable names. Each non-terminal node has
two  unweighted  outgoing  edges  labeled  0  and  1,  corresponding  to  the  value  of  the 
node’s  variable.  The  terminal  nodes  have  no  outgoing  edges.  We  consider  only 
ordered  BDDs,  where  the  order  of  the  variables  is  the  same  for  every  path  from  the 
root  node  to  a  terminal  node.  We  consider  only  reduced  BDDs,  where  identical 
subtrees  are  combined  into  a  single  tree. 

Definition 6 A multi-terminal BDD (MTBDD) 3) is an extension of a
BDD that represents an integer-valued function: $\{0,1\}^n \to S \subset Z$, where S is
a finite subset of the set Z of integers. In an MTBDD, the terminal nodes are
labeled by values of S.

Definition 7 An edge-valued BDD (EVBDD) 8),9) is also an extension
of a BDD that represents an integer-valued function. The EVBDD is obtained
by repeatedly applying the expansion $f = \bar{x}_i f_0 + x_i (f_1' + \alpha)$ to the
integer-valued function, where $f_1 = f_1' + \alpha$, and $\alpha$ is the constant term
of $f_1$. In an EVBDD, all 1-edges have an integer weight and all 0-edges have
weight 0. There is only one terminal node; it is labeled 0. The incoming edge
into the root node can have a non-zero weight. A non-zero weight $\alpha$ on the
incoming edge of the root node adds $\alpha$ to all sums associated with all paths
from the root to the terminal node of the EVBDD. Indeed, it occurs when the
EVBDD is a sub-EVBDD of a larger EVBDD.

Example 1 Figure 1 (b) and (c) show an MTBDD and an EVBDD for the
integer-valued function f defined by Fig. 1 (a). In Fig. 1 (b) and (c), dashed lines
and solid lines denote 0-edges and 1-edges, respectively. Note the non-zero weights
on 1-edges of the EVBDD. In the MTBDD, terminal nodes represent function
values. Thus, to evaluate the function, we traverse the MTBDD from the root
node to a terminal node according to the input values, and obtain the function


IPSJ  Transactions  on  System  LSI  Design  Methodology  Vol.  3  118-129  (Feb.  2010) 


©  2010  Information  Processing  Society  of  Japan 




 x1 y1 x0 y0 | f       x1 y1 x0 y0 | f
 ------------+---      ------------+---
  0  0  0  0 | 0        1  0  0  0 | 2
  0  0  0  1 | 0        1  0  0  1 | 2
  0  0  1  0 | 0        1  0  1  0 | 2
  0  0  1  1 | 0        1  0  1  1 | 2
  0  1  0  0 | 1        1  1  0  0 | 3
  0  1  0  1 | 1        1  1  0  1 | 4
  0  1  1  0 | 1        1  1  1  0 | 5
  0  1  1  1 | 1        1  1  1  1 | 6

(a)  Function  table. 


Fig.  1  MTBDD  and  EVBDD  for  an  integer-valued  function. 


value (an integer) from the terminal node. On the other hand, in the EVBDD,
we obtain the function value by summing the weights of the edges traversed from
the root node to the terminal node. (End of Example)
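The edge-weight summation in Example 1 can be sketched in code. The EVBDD below is a hypothetical one built by hand so that it reproduces the function table of Fig. 1 (a); it is not claimed to be the exact diagram drawn in Fig. 1 (c), and the node names `a` to `e` are made up for this illustration.

```python
# Function values of Fig. 1 (a), indexed by the bits (x1 y1 x0 y0).
TABLE = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5, 6]

# Hypothetical EVBDD: node -> (variable, 0-edge target, (1-edge weight, 1-edge target)).
# All 0-edges have weight 0; "T" is the single terminal node labeled 0.
EVBDD = {
    "a": ("x1", "b", (2, "c")),
    "b": ("y1", "T", (1, "T")),
    "c": ("y1", "T", (1, "d")),
    "d": ("x0", "e", (2, "e")),
    "e": ("y0", "T", (1, "T")),
}

def evbdd_value(assign, root="a", root_weight=0):
    """Sum the weights of the edges traversed from the root to the terminal."""
    total, node = root_weight, root
    while node != "T":
        var, lo, (weight, hi) = EVBDD[node]
        if assign[var]:
            total += weight
            node = hi
        else:
            node = lo
    return total

def table_value(assign):
    """Direct table look-up, playing the role of the MTBDD terminal value."""
    index = 8 * assign["x1"] + 4 * assign["y1"] + 2 * assign["x0"] + assign["y0"]
    return TABLE[index]
```

For the input x1 = y1 = x0 = y0 = 1, the traversed 1-edges carry weights 2, 1, 2, and 1, which sum to the table value 6.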

3. Piecewise Polynomial Approximation Based on Planar Segmentation

3.1  Planar  Segmentation  Problem 

To approximate a given two-variable function by piecewise polynomials, we
partition a given domain of the function into segments, and approximate the
function by a polynomial in each segment. By narrowing segments, and thus
increasing the number of segments, we can decrease the approximation error to the
desired value. In this case, the memory size and speed of an NFG strongly
depend on the segmentation of the domain. Thus, to design fast and compact
NFGs, we need to solve the following segmentation problem: Given a two-variable
function, its domain, and an acceptable approximation error, find an optimum
segmentation. To find an optimum segmentation, we consider the following:

(1) number of words in the coefficients memory, which is the number of
segments, and

(2) complexity of hardware to realize segmentation, called the segment index
encoder, which maps values of X and Y to a segment number.

Fewer segments are preferred because the number of segments directly affects
the size of the coefficients memory of the NFG. But the complexity of the segment
index encoder is important as well. Even if the number of segments is minimum,
a large NFG is produced if the segment index encoder is very large.

For one-variable functions, since the domain is one-dimensional (a line),
any segmentation can be realized compactly. Thus, we considered only the
number of segments to find an optimum segmentation 14),19). On the other hand, for
two-variable functions, since the domain is two-dimensional (a plane),
the segment index encoders tend to be much more complex than for one-variable
functions. Thus, to find the optimum design of two-variable NFGs, it is necessary
to carefully consider the complexity of the segment index encoder.

For one-variable functions, we have proposed linear segmentation
algorithms 14),19) to find an optimum segmentation of a linear domain (an
approximation with the fewest segments) efficiently. However, for two-variable
functions, a planar segmentation algorithm is now required to find an optimum
segmentation of a planar domain. In planar segmentations, we have a higher degree
of freedom, and thus, finding an optimum segmentation becomes much more
difficult than in linear segmentation. Because many segments may be involved in a
practical design, the time needed to find an optimum segmentation can be very
long. To produce an efficient planar segmentation in a short computation time, we
focus on heuristic planar segmentation algorithms. The next subsection presents
two heuristic planar segmentation algorithms that produce an efficient planar
segmentation by regularly partitioning a given domain using squares.

3.2  Planar  Segmentation  Algorithms 

We first present a recursive planar segmentation algorithm to reduce the
hardware complexity of both the coefficients memory (the number of segments) and
the segment index encoder. Figure 2 shows this algorithm. Inputs of the
algorithm are a numeric function f(X,Y), a domain $\{[X_b, X_e), [Y_b, Y_e)\}$ for X
and Y, an accuracy $m_{in}$ of X and Y, and an acceptable approximation error
$\varepsilon_a$. This algorithm begins by computing an approximate polynomial g(X,Y).
This is an initial approximation. If its approximation error $\varepsilon$ is larger
than the given acceptable error $\varepsilon_a$, then the domain is partitioned into
four equal-sized square segments. For each segment, an approximate polynomial is
computed again. The same process is recursively repeated until all segments have
approximation errors smaller than $\varepsilon_a$. Note that this algorithm creates
segments of size $W_i \times W_i$, where $W_i = 2^{h_i} \times 2^{-m_{in}}$ and $h_i$ is an
integer. That is, all the segmentation points $P_i$ and $Q_i$ are restricted to
values such that the least significant $h_i$ bits are 0 (i.e.,
$P_i = (\ldots\, p_{-j+1}\, p_{-j}\, 0 0 \ldots 0)_2$, where $j = m_{in} - h_i$). This
restriction helps to reduce the complexity of the segment index encoder.

Fig. 2 Recursive planar segmentation algorithm.
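The recursion described above can be sketched as follows. This is not the algorithm of Fig. 2 itself, which is not reproduced in the text; it is a minimal reading of the description under several assumptions: the domain is a square of $2^k \times 2^k$ grid points starting at the origin, coordinates are counted in units of $2^{-m_{in}}$, the error is measured only at grid points, and the bilinear interpolation and error-equalizing shift of Section 3.3 are used. All function names are hypothetical.

```python
import math

def segment_error(f, ix, iy, w, m_in):
    """Approximation error of a w x w segment (in grid units) after the
    equalizing shift: (max difference - min difference) / 2 over grid points."""
    step = 2.0 ** -m_in
    bx, by = ix * step, iy * step
    ex, ey = bx + w * step, by + w * step
    fbb, fbe, feb, fee = f(bx, by), f(bx, ey), f(ex, by), f(ex, ey)
    area = (ex - bx) * (ey - by)
    diffs = []
    for i in range(ix, ix + w):
        for j in range(iy, iy + w):
            x, y = i * step, j * step
            g = (fbb * (ex - x) * (ey - y) + feb * (x - bx) * (ey - y)
                 + fbe * (ex - x) * (y - by) + fee * (x - bx) * (y - by)) / area
            diffs.append(f(x, y) - g)
    return (max(diffs) - min(diffs)) / 2

def recursive_segmentation(f, ix, iy, w, m_in, eps_a):
    """Return a list of square segments (ix, iy, w), with w a power of two."""
    if w == 1 or segment_error(f, ix, iy, w, m_in) <= eps_a:
        return [(ix, iy, w)]
    half = w // 2
    segments = []
    for dx in (0, half):          # split into four equal-sized squares
        for dy in (0, half):
            segments += recursive_segmentation(f, ix + dx, iy + dy, half, m_in, eps_a)
    return segments

def example_f(x, y):
    return math.sin(math.pi * x * y)

# Domain [0,1) x [0,1) with 4-bit inputs and acceptable error 2^-6.
segments = recursive_segmentation(example_f, 0, 0, 2 ** 4, 4, 2.0 ** -6)
```

Because every split halves a square that starts at a multiple of its own width, each segment boundary has its low-order bits equal to 0, which is the alignment property noted above.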

Next, we present the planar uniform segmentation algorithm. Since the
recursive planar segmentation algorithm produces non-uniform segmentation, a
segment index encoder is needed to compute a segment number from values of
X and Y. However, in a uniform segmentation where the number of segments is
a power of 2, a segment index encoder is not necessary because a segment
number is obtained from the most significant bits of X and Y (see Fig. 3 (b)). This
eliminates the delay of the segment index encoder, and produces fast NFGs. To
produce a uniform segmentation, we begin by finding the smallest square
segment needed to achieve the acceptable approximation error using the recursive
segmentation algorithm shown in Fig. 2. Then, we partition a given domain into
square segments all with the same size as the smallest segment.
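The last two sentences amount to a simple post-processing step, sketched below. The tuple format `(ix, iy, w)` in grid units and the sample list of seven segments are assumptions made for illustration, not data from the paper.

```python
def uniform_segmentation(recursive_segments, domain_width):
    """Tile a square domain with copies of the smallest recursive segment."""
    w_min = min(w for _, _, w in recursive_segments)
    per_side = domain_width // w_min
    return [(i * w_min, j * w_min, w_min)
            for i in range(per_side) for j in range(per_side)]

# Hypothetical recursive result on a 16 x 16 grid: three 8 x 8 squares and
# one quadrant split further into four 4 x 4 squares (seven segments).
recursive = [(0, 0, 8), (8, 0, 8), (0, 8, 8),
             (8, 8, 4), (12, 8, 4), (8, 12, 4), (12, 12, 4)]
uniform = uniform_segmentation(recursive, 16)
```

Here seven non-uniform segments become sixteen uniform ones, which illustrates why uniform segmentation trades a larger coefficients memory for the removal of the segment index encoder.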

3.3  Approximation  Using  Bilinear  Interpolation  Polynomials 

For g(X,Y) in Fig. 2, we can use any approximating polynomial. In general,
higher-order polynomials require fewer segments. However, for multi-variable
functions, using higher-order polynomials is not always effective in reducing the
memory size of NFGs. This is because, for multi-variable polynomials, a higher
polynomial order requires many more polynomial coefficients. Also, higher-order
polynomials produce slower NFGs. Thus, for polynomial approximation
methods, reducing memory size with a small speed penalty is a key issue. To
accomplish this, we use the bilinear interpolation polynomials 21).

Bilinear interpolation is an extension of linear interpolation. It
interpolates two-variable functions f(X,Y) using four points. In Fig. 2, to
interpolate f(X,Y) in each segment $\{[B_x, E_x), [B_y, E_y)\}$, we use four corner
points of the segment: $(B_x, B_y)$, $(B_x, E_y)$, $(E_x, B_y)$, and $(E_x, E_y)$. Let
$f_{bb} = f(B_x, B_y)$, $f_{be} = f(B_x, E_y)$, $f_{eb} = f(E_x, B_y)$, and
$f_{ee} = f(E_x, E_y)$. Then, the bilinear interpolation g(X,Y) is given by:

$$g(X,Y) = \frac{f_{bb}(E_x - X)(E_y - Y) + f_{eb}(X - B_x)(E_y - Y)}{(E_x - B_x)(E_y - B_y)}
         + \frac{f_{be}(E_x - X)(Y - B_y) + f_{ee}(X - B_x)(Y - B_y)}{(E_x - B_x)(E_y - B_y)}.$$

By  expanding  and  rearranging  this,  we  obtain  the  following  form: 

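$$g(X,Y) = C_{xy} XY + C_x X + C_y Y + C_0,$$

where

$$C_{xy} = \frac{f_{bb} - f_{eb} - f_{be} + f_{ee}}{(E_x - B_x)(E_y - B_y)},$$

$$C_x = \frac{-f_{bb} E_y + f_{eb} E_y + f_{be} B_y - f_{ee} B_y}{(E_x - B_x)(E_y - B_y)},$$

$$C_y = \frac{-f_{bb} E_x + f_{eb} B_x + f_{be} E_x - f_{ee} B_x}{(E_x - B_x)(E_y - B_y)},$$

$$C_0 = \frac{f_{bb} E_x E_y - f_{eb} B_x E_y - f_{be} E_x B_y + f_{ee} B_x B_y}{(E_x - B_x)(E_y - B_y)}.$$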

To reduce the approximation error, the maximum positive error $max_{fg}$ and the
maximum negative error $min_{fg}$ are equalized by a vertical shift of g(X,Y) with
a correction value $v = (max_{fg} + min_{fg})/2$. Thus, the approximation error is
$(max_{fg} - min_{fg})/2$, and the approximating polynomial is $g(X,Y) + v$.

For each segment $\{[B_x, E_x), [B_y, E_y)\}$, since $B_x \le X < E_x$ and
$B_y \le Y < E_y$ hold, we can offset X and Y by $B_x$ and $B_y$ to compute the
approximating polynomial $g(X,Y) + v$. By using the offset inputs $(X - B_x)$ and
$(Y - B_y)$ instead of X and Y, we reduce the size of multipliers needed to compute
$g(X,Y) + v$. By substituting $X - B_x + B_x$ and $Y - B_y + B_y$ for X and Y,
respectively, we transform $g(X,Y) + v$ as follows:

$$\begin{aligned}
g(X,Y) + v &= C_{xy}(X - B_x + B_x)(Y - B_y + B_y) + C_x(X - B_x + B_x) \\
           &\quad + C_y(Y - B_y + B_y) + C_0 + v \\
           &= C_{xy}(X - B_x)(Y - B_y) + (C_x + C_{xy} B_y)(X - B_x) \\
           &\quad + (C_y + C_{xy} B_x)(Y - B_y) + C_{xy} B_x B_y + C_x B_x + C_y B_y + C_0 + v \\
           &= C_{xy}(X - B_x)(Y - B_y) + C'_x (X - B_x) + C'_y (Y - B_y) + C'_0, \qquad (1)
\end{aligned}$$

where $C'_x = C_x + C_{xy} B_y$, $C'_y = C_y + C_{xy} B_x$, and
$C'_0 = C_{xy} B_x B_y + C_x B_x + C_y B_y + C_0 + v$.
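The coefficient formulas and the offset form (1) can be checked numerically. The sketch below computes the four coefficients of (1) from the corner values and evaluates the polynomial on offset inputs; the function names and the sample corner values are assumptions chosen so that all arithmetic is exact in binary floating point.

```python
def offset_coefficients(fbb, fbe, feb, fee, bx, ex, by, ey, v=0.0):
    """Return (C_xy, C'_x, C'_y, C'_0) of Eq. (1) for one segment."""
    d = (ex - bx) * (ey - by)
    cxy = (fbb - feb - fbe + fee) / d
    cx = (-fbb * ey + feb * ey + fbe * by - fee * by) / d
    cy = (-fbb * ex + feb * bx + fbe * ex - fee * bx) / d
    c0 = (fbb * ex * ey - feb * bx * ey - fbe * ex * by + fee * bx * by) / d
    cx_p = cx + cxy * by
    cy_p = cy + cxy * bx
    c0_p = cxy * bx * by + cx * bx + cy * by + c0 + v
    return cxy, cx_p, cy_p, c0_p

def evaluate_offset(coeffs, dx, dy):
    """Evaluate Eq. (1) with dx = X - B_x and dy = Y - B_y."""
    cxy, cx_p, cy_p, c0_p = coeffs
    return cxy * dx * dy + cx_p * dx + cy_p * dy + c0_p

# Segment [0.25, 0.5) x [0.5, 0.75) with made-up corner values and v = 0.
coeffs = offset_coefficients(1.0, 2.0, 3.0, 5.0, 0.25, 0.5, 0.5, 0.75)
```

With v = 0 the polynomial must reproduce the four corner values, so evaluating at offsets (0, 0), (0.25, 0), (0, 0.25), and (0.25, 0.25) should return $f_{bb}$, $f_{eb}$, $f_{be}$, and $f_{ee}$, respectively.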

4.  Programmable  Architectures  for  Two- Variable  NFGs 

4.1  Architectures  Based  on  Recursive  and  Uniform  Segmentations 

Figure 3 shows two architectures for two-variable NFGs realizing (1).
Figure 3 (a) and (b) show architectures based on recursive segmentation and uniform
segmentation, respectively. The segment index encoder converts values of X and
Y into a segment number. This, in turn, is applied as the address input of the
coefficients memory. The coefficients are applied to adders and multipliers to form
the polynomial value $g(X,Y) + v$. Note that Fig. 3 (a) uses bitwise AND gates to
compute $X - B_x$ and $Y - B_y$. In recursive segmentation, we can realize $X - B_x$
and $Y - B_y$ using AND gates driven on one side by $B_x$ and $B_y$, respectively 15).

Note that Fig. 3 (b) has neither a segment index encoder nor bitwise AND
gates. In uniform segmentation, the segment index encoder and bitwise AND
gates are not necessary. This is because a segment number is obtained from the
most significant bits of X and Y, and $X - B_x$ and $Y - B_y$, which are realized
with bitwise AND gates in Fig. 3 (a), are obtained from the least significant bits.

(a) Architecture based on recursive segmentation. (b) Architecture based on uniform segmentation.

Fig. 3 Architectures for two-variable NFGs using bilinear interpolation.
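The bit-level view of the two cases can be sketched as follows, treating X and Y as unsigned n-bit integers in grid units. The address layout (most significant bits of X concatenated with those of Y) and the mask-based offset are assumptions for illustration; the text does not specify how the memories in Fig. 3 are addressed or how the AND-gate operands are generated.

```python
def uniform_address(x: int, y: int, n: int, h: int):
    """Uniform segments of 2^h x 2^h grid points on n-bit inputs.

    Returns (segment number, X - B_x, Y - B_y).
    """
    mask = (1 << h) - 1
    segment = ((x >> h) << (n - h)) | (y >> h)   # concatenated MSBs
    return segment, x & mask, y & mask           # offsets are the LSBs

def offset_in_aligned_segment(x: int, h: int) -> int:
    """X - B_x for a segment of width 2^h whose B_x has h low-order zeros."""
    return x & ((1 << h) - 1)
```

For instance, with 8-bit inputs and h = 3, X = 10110110 and Y = 01001101 give the MSB pairs 10110 and 01001 as the segment number and the LSBs 110 and 101 as the offsets.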
4.2  Architecture  and  Design  Method  for  Segment  Index  Encoder 
The segment index encoder realizes the segment index function:
$\{0,1\}^n \times \{0,1\}^n \to \{0, 1, \ldots, k-1\}$ shown in Fig. 4 (a), where X and Y
have n bits, and k denotes the number of segments. We realize this function with
the architecture shown in Fig. 4 (b). In this architecture, the values of
interconnecting lines between adjacent LUT memories represent sub-functions in the
EVBDD (labeled rails), and the outputs from each LUT memory to the adders tally
the function value (labeled Arails). Consider the design of the LUT cascade and
adders in Fig. 4 (b), given the segmentation produced in Fig. 2.

We begin by representing the segment index function using an MTBDD.
Figure 5 illustrates the relationship between recursive segmentation and MTBDDs.

 Segments                                             Index
 $X_b \le X < P_0$,      $Y_b \le Y < Q_0$            0
 $X_b \le X < P_0$,      $Q_0 \le Y < Q_1$            1
 ...                                                  ...
 $P_{r-1} \le X < X_e$,  $Q_{i-1} \le Y < Y_e$        k-1

(a) Segment index function.

(b) LUT cascade and adders 15).

Fig. 4 Segment index encoder.
(a) Two segments. (b) Four segments. (c) Seven segments.

Fig. 5 Relationship between recursive segmentation and MTBDDs.

Fig. 6 Decomposition of the EVBDD.


Then, we convert the MTBDD into an EVBDD. By decomposing the EVBDD,
as shown in Fig. 6, we obtain the architecture in Fig. 4 (b). In Fig. 6, the column
labeled "$r_i$" in the table of each LUT memory denotes the rails that represent
sub-functions in the EVBDD. The column "$a_i$" in Fig. 6 denotes the Arails
that represent the sum of weights of edges. In the EVBDD, "$(a_i, r_i)$" assigned
to edges that cut across the horizontal lines represents the sum of weights and
the sub-function, respectively. For more detail on this architecture, see Ref. 15).

In this architecture, the size of LUT memories realizing the recursive
segmentation depends on the number of segments. Specifically:

Theorem 1 Let seg_func(X,Y) be a segment index function obtained by a
recursive planar segmentation. The segment index function can be realized by the
segment index encoder shown in Fig. 4 (b) with at most $\lceil \log_2 k \rceil$ rails and
at most $\lceil \log_2 k \rceil$ Arails, where k is the number of segments.

Proof: See Appendix.
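To get a feel for the bound in Theorem 1, the quantity $\lceil \log_2 k \rceil$ can be computed with integer arithmetic only. The helper name `rail_bound` is hypothetical, and the sketch assumes the bracket in the theorem denotes the ceiling.

```python
def rail_bound(k: int) -> int:
    """Smallest r with 2**r >= k, i.e. ceil(log2 k) for k >= 1."""
    return (k - 1).bit_length()
```

For example, a segmentation with 7 segments needs at most 3 rails, and one with 997 segments needs at most 10.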

The segment index encoder satisfying Theorem 1 is obtained when the variable
order for the EVBDD is $x_{l-1}, y_{l-1}, x_{l-2}, y_{l-2}, \ldots, x_{-m}, y_{-m}$ from the top
to the bottom (see the proof in Appendix). We can also use the optimization
techniques for multi-valued decision diagrams presented in Ref. 13) to optimize the
variable order for the EVBDD.

In our architectures, the coefficients memory and the LUT memories of the
segment index encoder are implemented by RAMs. Thus, by changing the data
for the coefficients memory and the LUT memories, a wide class of two-variable
functions can be realized by a single architecture.

5.  Design  Method  for  Symmetric  Functions 

Definition 8 A two-variable function f(X,Y) is symmetric if f(X,Y) =
f(Y,X).

Symmetric functions are commonly found in practical applications of NFGs. For
example, $\sqrt{X^2 + Y^2}$, which is used in converting from rectangular to polar
coordinates, is symmetric. This section presents an architecture and a design
method taking advantage of the function's symmetry.

Definition 9 A segmentation is symmetric if for every segment
$\{[B_{x1}, E_{x1}), [B_{y1}, E_{y1})\}$ such that $B_{x1} \ne B_{y1}$ or $E_{x1} \ne E_{y1}$, there is
another segment $\{[B_{x2}, E_{x2}), [B_{y2}, E_{y2})\}$ such that $B_{x1} = B_{y2}$,
$E_{x1} = E_{y2}$, $B_{y1} = B_{x2}$, and $E_{y1} = E_{x2}$. Symmetric segments are a pair of
such segments.

Lemma 1 Let f(X,Y) be a symmetric function, and let $g_1(X,Y)$ and
$g_2(X,Y)$ be bilinear interpolations of f(X,Y) for symmetric segments. Then,
$g_1(X,Y) = g_2(Y,X)$.

Proof:  See  Appendix. 

Theorem 2 The segmentation of a symmetric function produced by the
recursive planar segmentation algorithm is symmetric.

Proof:  See  Appendix. 

From  Lemma  1  and  Theorem  2,  we  can  use  only  one  bilinear  interpolation  to 
approximate  the  given  symmetric  function  in  symmetric  segments.  By  assigning 
the  same  segment  index  to  symmetric  segments,  we  can  reduce  the  size  of  the 
coefficients  memory  by  nearly  half. 
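Why the saving is "nearly" rather than exactly half can be seen by counting: segments on the diagonal are their own mirror image and cannot be merged. The sketch below assumes segments are given as `(ix, iy, w)` tuples in grid units and that the segmentation is symmetric in the sense of Definition 9; the function name is hypothetical.

```python
def merge_symmetric(segments):
    """Keep one representative (ix <= iy) of each pair of symmetric segments."""
    return [(ix, iy, w) for ix, iy, w in segments if ix <= iy]

# A symmetric 4 x 4 uniform segmentation with unit-sized segments.
grid = [(i, j, 1) for i in range(4) for j in range(4)]
kept = merge_symmetric(grid)
```

Of the 16 segments, the 4 diagonal ones and 6 of the 12 off-diagonal ones remain, so 10 words are needed instead of 16.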
Table 1 Number of segments for two segmentation methods.

 No.  Function f(X,Y)             Domain of X   Domain of Y
 f0   $\sin(\pi X)\sqrt{Y}$       [0,1)         (0,1)
 f1   $\sin(\pi XY)$              [0,1)         [0,1)
 f2   $X^4 Y^5$                   [0,1)         [0,1)
 f3   $1/\sqrt{X^2 + Y^2}$        (0,1)         (0,1)
 f4   $XY/\sqrt{X^2 + Y^2}$       (0,1)         (0,1)
 f5   WaveRings                   (0,4          [0,4
 f6   Sombrero                    (0,8)         (0,8)
 f7   $\sqrt{X^2 + Y^2}$          (0,1)         (0,1)
 f8   $\sqrt[3]{X^3 + Y^3}$       (0,1)         (0,1)

X and Y have 8-bit accuracy (acceptable approx. error: $2^{-10}$)

 No.  Number of segments            Rs1 [%]  Rs2 [%]  Time [sec.]
      Uni.      Recur.    Sym.                        Uni.    Recur.
 f0   16,384    997       N/A       6        N/A      0.69    0.06
 f1   1,024     508       263       50       26       0.07    0.03
 f2   4,096     193       N/A       5        N/A      0.30    0.02
 f3   65,025    2,344     1,195     4        2        0.02    0.07
 f4   4,096     256       139       6        3        0.12    0.01
 f5   10,201    949       490       9        5        0.85    0.04
 f6   4,096     1,180     607       29       15       0.21    0.06
 f7   4,096     226       121       6        3        0.17    0.01
 f8   4,096     232       127       6        3        0.33    0.02

X and Y have 12-bit accuracy (acceptable approx. error: $2^{-14}$)

 No.  Number of segments                Rs1 [%]  Rs2 [%]  Time [sec.]
      Uni.        Recur.    Sym.                          Uni.    Recur.
 f0   16,773,120  29,875    N/A         0.2      N/A      9.26    1.97
 f1   16,384      8,389     4,232       51       26       1.00    0.42
 f2   65,536      3,592     N/A         5        N/A      4.73    0.33
 f3   16,769,025  103,046   51,687      0.6      0.3      6.10    3.91
 f4   1,048,576   4,114     2,104       0.4      0.2      28.05   0.16
 f5   646,416     16,278    8,202       3        1        24.51   0.76
 f6   65,536      18,664    9,398       28       14       3.40    0.93
 f7   1,048,576   4,093     2,083       0.4      0.2      40.58   0.22
 f8   1,048,576   3,955     2,027       0.4      0.2      78.21   0.41

Uni.: Uniform segmentation. Recur.: Recursive segmentation. Rs1: (No. of segments in Recur.) / (No. of segments in Uni.) x 100 (%).

Sym.: Symmetric segments are counted as one segment. Rs2: (No. of segments in Sym.) / (No. of segments in Uni.) x 100 (%).

Experiment environment: Sun Blade 2500 (Silver), UltraSPARC-IIIi 1.6 GHz, 6 GB memory, Solaris 9.
Fig. 7 Architecture for two-variable NFGs for symmetric functions.


Figure  7  shows  an  architecture  for  symmetric  functions.  Here,  the  coefficients 
memory  stores  only  data  for  segments  such  that  X  <  Y.  For  other  segments, 
approximated  values  are  computed  using  Lemma  1.  Since  the  comparator  and 
multiplexers  operate  in  parallel  with  the  segment  index  encoder,  there  is  no  speed 
penalty  due  to  these  additional  circuits. 
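Functionally, the comparator and multiplexers swap the inputs so that only the stored half of the domain is ever addressed, which is justified by Lemma 1. The sketch below models this behavior in software; `symmetric_nfg` and `stored_half` are hypothetical names, and the exact square root stands in for the stored piecewise approximation.

```python
import math

def symmetric_nfg(x, y, approx_x_le_y):
    """Evaluate a symmetric function using data stored only for X <= Y."""
    if x > y:            # comparator
        x, y = y, x      # multiplexers swap the two inputs
    return approx_x_le_y(x, y)

def stored_half(x, y):
    """Stand-in for the stored approximation; only valid when x <= y."""
    assert x <= y
    return math.sqrt(x * x + y * y)
```

Calling `symmetric_nfg(0.75, 0.25, stored_half)` and `symmetric_nfg(0.25, 0.75, stored_half)` returns the same value, and `stored_half` is never invoked outside the half it covers.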


6.  Experimental  Results 


6.1 Number of Segments and Computation Time for Algorithms

Table 1 shows the number of segments produced by the two segmentation
algorithms presented in Section 3, and their computation time for various
functions. This table also shows the number of symmetric segments for symmetric
functions. In this table, WaveRings and Sombrero are

$$\mathrm{WaveRings} = \frac{\cos\left(\sqrt{X^2 + Y^2}\right)}{\sqrt{X^2 + Y^2} + 0.25}, \qquad
  \mathrm{Sombrero} = \frac{\sin\left(\sqrt{X^2 + Y^2}\right)}{\sqrt{X^2 + Y^2}}.$$

Table 1 shows that, for all functions except $\sin(\pi XY)$ and Sombrero, the
recursive segmentation algorithm produces many fewer segments than the uniform
segmentation algorithm. For higher accuracy in particular, the number of segments
needed in recursive segmentation is only a few percent of the number of
segments needed in uniform segmentation. The number of symmetric segments
is even smaller. Thus, the recursive segmentation algorithm and the symmetric
technique significantly reduce the number of words in the coefficients memory.
For $\sin(\pi XY)$ and Sombrero, the number of additional segments needed in
uniform segmentation is not so large even for higher accuracy. This means that,
for these
Table 2 Total memory sizes needed for the proposed NFGs.

8-bit accuracy NFGs

 No.  Uniform      Recursive   Sym.       Rm1 [%]  Rm2 [%]
 f0   409,600      57,732      N/A        14       N/A
 f1   37,888       34,580      19,417     91       51
 f2   118,784      13,817      N/A        12       N/A
 f3   1,040,400    175,760     97,236     17       9
 f4   118,784      16,064      10,145     14       9
 f5   397,839      71,981      39,986     18       10
 f6   143,360      74,896      40,772     52       28
 f7   118,784      14,908      9,334      13       8
 f8   135,168      15,512      9,658      11       7

12-bit accuracy NFGs

 No.  Uniform      Recursive   Sym.       Rm1 [%]  Rm2 [%]
 f0   201,277,440  2,167,788   N/A        1        N/A
 f1   737,280      701,311     356,164    95       48
 f2   2,621,440    226,644     N/A        9        N/A
 f3   402,456,600  9,412,758   4,698,276  2        1
 f4   34,603,008   293,330     153,176    0.8      0.4
 f5   27,149,472   1,559,560   797,279    6        3
 f6   2,818,048    1,487,068   757,238    53       27
 f7   34,603,008   287,868     153,291    0.8      0.4
 f8   38,797,312   294,328     154,309    0.8      0.4

Rm1: Recursive / Uniform x 100 (%). Rm2: Sym. / Uniform x 100 (%).


functions,  the  uniform  segmentation  method  also  produces  NFGs  with  reasonable 
size. 

In addition, Table 1 shows that both algorithms produce segmentations in a
short CPU time. Such quick segmentation is useful to reduce design time for NFGs.

6.2  Memory  Sizes  Needed  for  Numeric  Function  Generators 

Table  2  compares  total  memory  sizes  needed  for  the  three  NFGs  shown  in 
Fig.  3  and  Fig.  7.  Note  that  the  NFGs  based  on  recursive  segmentation  have  two 
kinds  of  memories:  coefficients  memory  and  LUT  memory.  The  memory  size 
shown  is  the  sum  of  the  coefficients  memory  size  and  the  LUT  memory  sizes. 

Table 2 shows that, for all functions, NFGs based on recursive segmentation
require less memory than NFGs based on uniform segmentation, even
though NFGs based on recursive segmentation have a segment index encoder.
For example, for $f_4(X,Y) = XY/\sqrt{X^2+Y^2}$, the 12-bit accuracy NFG using
recursive segmentation requires only 0.8% of the memory required by uniform
segmentation. For symmetric functions in particular, the symmetric technique
shown in Section 5 reduces the memory size significantly.
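
As a quick check of the ratios quoted above, the Rm1 and Rm2 columns of Table 2
can be recomputed from the memory sizes. The sketch below does this for the
12-bit entries of $f_4$. The dictionary layout and the helper name
`ratio_percent` are ours for illustration and are not part of the paper's tooling.

```python
# Memory sizes taken from the 12-bit entries of Table 2 for f4.
f4_12bit = {"uniform": 34_603_008, "recursive": 293_330, "symmetric": 153_176}

def ratio_percent(size, baseline):
    """Return size as a percentage of baseline."""
    return 100.0 * size / baseline

# Rm1 = Recursive / Uniform x 100, Rm2 = Sym. / Uniform x 100.
rm1 = ratio_percent(f4_12bit["recursive"], f4_12bit["uniform"])
rm2 = ratio_percent(f4_12bit["symmetric"], f4_12bit["uniform"])
print(f"Rm1 = {rm1:.2f}%  Rm2 = {rm2:.2f}%")
```

To one decimal place these values agree with the 0.8 and 0.4 listed in Table 2.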

To understand the relation between memory size and accuracy, we designed
NFGs for $XY/\sqrt{X^2+Y^2}$ with various accuracies. Figure 8 plots memory sizes
of the NFGs for 4- to 16-bit accuracies. There are four curves:

(1) a single look-up table in which the values assigned to X and Y form an
address and the content of that address is f(X, Y),
(2) NFGs using uniform segmentation,
(3) NFGs using recursive non-uniform segmentation, and
(4) NFGs using the symmetric technique.

Fig. 8  Memory size versus accuracy for $XY/\sqrt{X^2+Y^2}$.

Interestingly, for this function, the memory size of the NFGs using uniform
segmentation increases in the same way as the memory size of a single look-up table.
On the other hand, the memory sizes of the NFGs using recursive segmentation
and the NFGs using the symmetric technique increase much more slowly than the
other two. For 16-bit accuracy, the memory sizes of the NFG using recursive
segmentation and the NFG using the symmetric technique are only 0.06% and 0.03%
of that of the NFG using uniform segmentation, respectively.

6.3  FPGA  Implementation  Results 

To show the merits and demerits of the three proposed methods, we compare
the performance of the NFGs designed by the three methods. We implemented
12-bit accuracy NFGs using the Altera Stratix III FPGA. Since the FPGA has
adaptive look-up tables (ALUTs) that can realize fast adders, synchronous memory
blocks, and dedicated multipliers, our NFGs are efficiently implemented by
those hardware resources in the FPGA. Table 3 compares the FPGA implementation
results of the NFGs. In this table, the three columns labeled “Delay” show
the total delay time of each NFG from the input to the output, in nanoseconds.
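
The delay values in Table 3 appear consistent with the usual pipeline latency,
that is, the number of stages multiplied by the clock period. This relation is
our reading of the table rather than a formula stated in the text, and the
function name `pipeline_delay_ns` is hypothetical. Small differences from the
table are to be expected because the reported frequencies are rounded.

```python
def pipeline_delay_ns(stages, freq_mhz):
    """Latency of a pipeline: stages times the clock period, in ns."""
    return stages * 1000.0 / freq_mhz

# (stages, frequency in MHz, delay in ns) as reported in Table 3.
reported = {
    "f0 recursive": (15, 230, 65),
    "f4 recursive": (10, 228, 44),
    "f4 symmetric": (11, 222, 50),
}

for name, (stages, freq, delay) in reported.items():
    computed = pipeline_delay_ns(stages, freq)
    print(f"{name}: computed {computed:.1f} ns, table {delay} ns")
```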

The  NFGs  based  on  uniform  segmentation  require  fewer  pipeline  stages  and 
have  shorter  delay  than  the  recursive  segmentation  because  they  have  no  segment 
index  encoder.  However,  for  six  functions,  the  memory  needed  for  NFGs  based 
Table 3  FPGA implementation of 12-bit accuracy NFGs.

FPGA device: Altera Stratix III (EP3SL340F1517C2)  Logic synthesis tool: Altera QuartusII 9.0

       Uniform segmentation                   Recursive segmentation                 Symmetric method
No.    #ALUTs  #DSPs  Freq.  #stages  Delay   #ALUTs  #DSPs  Freq.  #stages  Delay   #ALUTs  #DSPs  Freq.  #stages  Delay
                      [MHz]           [ns]                   [MHz]           [ns]                   [MHz]           [ns]
f0     -       0      -      1        -       440     6      230    15       65      N/A     N/A    N/A    N/A      N/A
f1     49      5      203    4        20      271     10     191    9        47      297     9      191    10       52
f2     206     0      306    4        13      266     8      187    10       53      N/A     N/A    N/A    N/A      N/A
f3     -       0      -      1        -       -       7      -      18       -       644     7      174    16       92
f4     -       0      -      4        -       220     6      228    10       44      273     6      222    11       50
f5     -       3      -      4        -       477     10     221    13       59      493     10     221    13       59
f6     153     4      230    4        17      336     8      192    11       57      293     8      191    11       57
f7     -       1      -      4        -       237     6      231    10       43      279     6      231    11       48
f8     -       1      -      4        -       236     8      199    10       50      255     8      199    10       50

-: NFGs cannot be mapped into the FPGA due to insufficient memory blocks.

#ALUTs: Number of ALUTs.  #DSPs: Number of 18-bit × 18-bit DSP units.  Freq.: Operating frequency.  #stages: Number of pipeline stages.


on uniform segmentation is so large that they could not be implemented on
the FPGA. Note that NFGs that have only one pipeline stage in Table 3 are
realized with a single look-up table because the number of segments is excessive. On
the other hand, for all functions except $f_3(X,Y) = 1/\sqrt{X^2+Y^2}$, the NFGs
based on recursive segmentation do not require excessive memory and can
be implemented on the FPGA. Further, the successful implementations achieve
a high operating frequency. Since the symmetric technique significantly reduces
memory size, even function $f_3$ can be implemented on the FPGA. However, the
symmetric technique incurs some speed penalty because it produces a slightly more
complex segment index encoder.

In  this  way,  the  three  methods  have  different  merits  and  demerits,  and  thus,  we 
can  use  the  three  methods  appropriately  depending  on  applications  and  numeric 
functions. 

Although the recursive segmentation and symmetric technique have some speed
penalty as shown in Table 3, the penalty is reasonable. To show that, we compare
our NFGs with an NFG designed in a conventional (trivial) manner that uses a
combination of one-variable NFGs and basic operations like addition and
multiplication. We implemented $f_4(X,Y) = XY/\sqrt{X^2+Y^2}$ with the same FPGA
using a one-variable NFG for $1/\sqrt{X}$, two squaring circuits, an adder, and two
multipliers. The one-variable NFG was realized by the method shown in 15),


Table 4  FPGA implementation of various NFGs for $XY/\sqrt{X^2+Y^2}$.

FPGA device: Altera Stratix III (EP3SL340F1517C2)
Logic synthesis tool: Altera QuartusII 9.0

NFGs          Memory      #LEs  #DSPs  Freq.  #stages  Delay
              [bits]                   [MHz]           [nsec.]
12-bit accuracy
One-variable  269,136     266   10     192    14       73
Uniform       34,603,008  -     0      -      4        -
Recursive     293,330     220   6      228    10       44
Symmetric     153,176     273   6      222    11       50


which  is  based  on  linear  approximation  and  non-uniform  segmentation.  Table  4 
compares  the  results  with  our  NFGs. 

Our NFG based on recursive segmentation requires fewer ALUTs and DSPs
than the implementation using a one-variable NFG, and has much shorter delay.
In particular, the NFG designed by the symmetric method achieves both less
memory and shorter delay. This shows that the speed penalties caused by the recursive
segmentation and the symmetric method are small enough.

From these results, we can see that by designing two-variable functions using
one-variable NFGs, the required memory size can be reduced significantly.
However, depending on the function, this approach can produce a slow implementation
because of additional logic, such as multipliers. Also, complicated architectures using
one-variable NFGs make error analysis harder, and it is harder to guarantee
output accuracy. This increases design time. On the other hand, for a large class of
functions, we can automatically generate fast and compact NFGs whose output
accuracy is guaranteed.

7.  Concluding  Remarks 

We have proposed programmable architectures and design methods for numeric
function generators of two-variable functions. To realize a two-variable function
in hardware, we partition the given domain of the function into segments, and
approximate the given function by a polynomial in each segment. In this paper,
we presented two planar segmentation algorithms that efficiently partition the
given domain of a two-variable function. We also presented a design method for
symmetric two-variable functions. To the best of our knowledge, these are the
first systematic design methods based on piecewise polynomial approximation
for two-variable functions. Experimental results showed that for a complicated
function, our automatically generated NFG achieves higher performance than the
NFG that is manually designed in a conventional manner.

In the proposed architectures, the coefficients memory and the LUT memories
of the segment index encoder can be implemented by embedded RAMs in an
FPGA (e.g., M4Ks in Altera FPGAs). Thus, by changing the data for the
coefficients memory and the LUT memories, a wide class of two-variable functions
can be realized by a single architecture. Since changing the RAM data is enough to
switch functions, no reprogramming of the FPGA is needed.
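
The following is a minimal software model of this idea, not the hardware itself.
It assumes uniform segmentation of [0, 1) × [0, 1) into a grid of square segments
and bilinear coefficients obtained from the four corners of each segment. The
class name `TableDrivenNFG` and its methods are hypothetical. The evaluation path
stays fixed, and only the contents of the coefficient table change when a
different function is loaded.

```python
import math

class TableDrivenNFG:
    """Evaluate Cxy*X*Y + Cx*X + Cy*Y + C0 with coefficients read from a table."""

    def __init__(self, segments_per_axis):
        self.n = segments_per_axis
        self.table = {}  # plays the role of the coefficients memory

    def load(self, f):
        """Rewrite the table for function f; the evaluation code is unchanged."""
        step = 1.0 / self.n
        for i in range(self.n):
            for j in range(self.n):
                bx, by = i * step, j * step
                ex, ey = bx + step, by + step
                f00, f10, f01, f11 = f(bx, by), f(ex, by), f(bx, ey), f(ex, ey)
                a = (f10 - f00) / step
                b = (f01 - f00) / step
                c = (f11 - f10 - f01 + f00) / (step * step)
                self.table[(i, j)] = (
                    c,                                    # Cxy
                    a - c * by,                           # Cx
                    b - c * bx,                           # Cy
                    f00 - a * bx - b * by + c * bx * by,  # C0
                )

    def evaluate(self, x, y):
        """Select the segment from (x, y), then evaluate the polynomial."""
        i = min(int(x * self.n), self.n - 1)
        j = min(int(y * self.n), self.n - 1)
        cxy, cx, cy, c0 = self.table[(i, j)]
        return cxy * x * y + cx * x + cy * y + c0

nfg = TableDrivenNFG(segments_per_axis=64)
nfg.load(lambda x, y: math.sin(math.pi * x * y))
err_sin = abs(nfg.evaluate(0.3, 0.7) - math.sin(math.pi * 0.3 * 0.7))
nfg.load(lambda x, y: math.sqrt(x * x + y * y))  # same object, new table data
err_norm = abs(nfg.evaluate(0.3, 0.7) - math.hypot(0.3, 0.7))
print(err_sin, err_norm)
```

In this model, `load` corresponds to rewriting the RAM contents, while `evaluate` corresponds to the fixed datapath.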

The  algorithms  and  architectures  presented  in  this  paper  can  be  easily  extended 
to  functions  with  three  or  more  variables. 

Appendix

This  appendix  shows  the  proofs  of  Theorem  1,  Lemma  1,  and  Theorem  2. 

The  proof  of  Theorem  1  is  based  on  a  theorem  proven  in  Ref.  15).  Specifically, 
it  was  shown  that 

Theorem A  Let $g(Z)$ be a $k$-valued monotone increasing function. The function
$g(Z)$ can be realized by the segment index encoder shown in Fig. 4 (b) with
at most $\lceil \log_2 k \rceil$ rails and $\lceil \log_2 k \rceil$ Arails 15).

Theorem 1  Let seg_func(X, Y) be a segment index function obtained by a
recursive planar segmentation. The segment index function can be realized by the
segment index encoder shown in Fig. 4 (b) with at most $\lceil \log_2 k \rceil$ rails and at most
$\lceil \log_2 k \rceil$ Arails, where $k$ is the number of segments.

Proof: By forming a variable
$Z = (x_{l-1}\; y_{l-1}\; x_{l-2}\; y_{l-2}\; \cdots\; x_{-m}\; y_{-m})$
from X and Y, seg_func(X, Y) obtained by the recursive planar segmentation
algorithm can be converted into a $k$-valued monotone increasing function $g(Z)$.
Therefore, from Theorem A, we have this theorem. □
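
The construction used in this proof can be illustrated with a small sketch. It
assumes n-bit unsigned integer inputs (no fractional bits), a hypothetical split
criterion `needs_split` that stands in for the approximation-error test, and
segments numbered in the order in which the recursion emits them. Under these
assumptions, the segment index never decreases as the interleaved value Z increases.

```python
def interleave(x, y, n):
    """Form Z = (x_{n-1} y_{n-1} ... x_0 y_0) from two n-bit integers."""
    z = 0
    for i in range(n - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z

def segment(x0, y0, size, needs_split, leaves):
    """Recursively split a square into four equal squares, visiting the
    quadrants in the order (x bit, y bit) = 00, 01, 10, 11."""
    if size > 1 and needs_split(x0, y0, size):
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                segment(x0 + dx, y0 + dy, half, needs_split, leaves)
    else:
        leaves.append((x0, y0, size))

def segment_index_by_z(n, needs_split):
    """Return (k, g), where k is the number of segments and g[Z] is the
    segment index of the point whose interleaved coordinate is Z."""
    leaves = []
    segment(0, 0, 1 << n, needs_split, leaves)
    g = [None] * (1 << (2 * n))
    for index, (x0, y0, size) in enumerate(leaves):
        for x in range(x0, x0 + size):
            for y in range(y0, y0 + size):
                g[interleave(x, y, n)] = index
    return len(leaves), g

# Hypothetical criterion: keep refining the square that touches the origin.
k, g = segment_index_by_z(3, lambda x0, y0, size: x0 == 0 and y0 == 0)
is_monotone = all(g[z] <= g[z + 1] for z in range(len(g) - 1))
print(k, is_monotone)
```

Each square produced by the recursion covers a contiguous range of Z, which is why the numbering is monotone.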

Lemma 1  Let $f(X,Y)$ be a symmetric function, and let $g_1(X,Y)$ and
$g_2(X,Y)$ be bilinear interpolations of $f(X,Y)$ for symmetric segments. Then,
$g_1(X,Y) = g_2(Y,X)$.

Proof: Let $g_1(X,Y) = C_{xy1}XY + C_{x1}X + C_{y1}Y + C_{01}$ and
$g_2(X,Y) = C_{xy2}XY + C_{x2}X + C_{y2}Y + C_{02}$. From the definition of bilinear
interpolation, in symmetric segments, the following hold: $C_{xy1} = C_{xy2}$,
$C_{x1} = C_{y2}$, $C_{y1} = C_{x2}$, and $C_{01} = C_{02}$. Therefore, we have the lemma. □
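
The coefficient relations in this proof can be checked numerically. The sketch
below assumes that the bilinear interpolation matches f at the four corners of a
segment, which may differ in detail from the definition used in the body of the
paper. The helper `bilinear_coefficients` and the test function are
hypothetical. Exact rational arithmetic is used so that the comparisons are not
affected by rounding.

```python
from fractions import Fraction as F

def bilinear_coefficients(f, bx, ex, by, ey):
    """Return (Cxy, Cx, Cy, C0) of the bilinear polynomial that matches f
    at the four corners of the segment [bx, ex) x [by, ey)."""
    f00, f10, f01, f11 = f(bx, by), f(ex, by), f(bx, ey), f(ex, ey)
    dx, dy = ex - bx, ey - by
    a = (f10 - f00) / dx
    b = (f01 - f00) / dy
    c = (f11 - f10 - f01 + f00) / (dx * dy)
    return c, a - c * by, b - c * bx, f00 - a * bx - b * by + c * bx * by

def f(x, y):
    """A symmetric test function: f(x, y) == f(y, x)."""
    return x * y / (1 + x * x + y * y)

# A segment (Bx, Ex, By, Ey) and its mirror image across the line X = Y.
seg1 = (F(1, 2), F(3, 4), F(0), F(1, 4))
seg2 = (seg1[2], seg1[3], seg1[0], seg1[1])

cxy1, cx1, cy1, c01 = bilinear_coefficients(f, *seg1)
cxy2, cx2, cy2, c02 = bilinear_coefficients(f, *seg2)
print(cxy1 == cxy2, cx1 == cy2, cy1 == cx2, c01 == c02)
```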

To prove Theorem 2, we define the following:

Definition A  Diagonal segment $\{[B_x, E_x), [B_y, E_y)\}$ is a segment such
that $B_x = B_y$ and $E_x = E_y$.

Theorem 2  The segmentation of a symmetric function produced by the
recursive planar segmentation algorithm is symmetric.

Proof: Let $\{[B_{x1}, E_{x1}), [B_{y1}, E_{y1})\}$ and $\{[B_{x2}, E_{x2}), [B_{y2}, E_{y2})\}$ be symmetric
segments. Since $(B_{x1} + E_{x1})/2 = (B_{y2} + E_{y2})/2$ and $(B_{y1} + E_{y1})/2 = (B_{x2} + E_{x2})/2$,
the segmentation of the symmetric segments into four equal-sized square
segments is symmetric. The segmentation of a diagonal segment into four
equal-sized square segments is also symmetric.

From Lemma 1, the maximum approximation errors caused in symmetric
segments are equal. Thus, if a segment is partitioned, then the segment
symmetric to it is also partitioned.

Therefore, the recursive planar segmentation algorithm produces a symmetric
segmentation for a symmetric function. □
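
The statement of Theorem 2 can also be observed on a small example. The sketch
below is a simplified stand-in for the recursive planar segmentation algorithm.
It assumes corner-based bilinear interpolation, estimates the maximum error by
sampling a small grid inside each segment, and stops at a hypothetical minimum
segment size. The function names, the test function, and the tolerance are
illustrative choices, not values from the paper.

```python
from fractions import Fraction as F

def f(x, y):
    """A symmetric test function: f(x, y) == f(y, x)."""
    return x * y / (1 + x * x + y * y)

def max_error(bx, by, size, samples=4):
    """Largest sampled error of the corner-based bilinear interpolation
    of f on the square segment [bx, bx+size) x [by, by+size)."""
    f00, f10 = f(bx, by), f(bx + size, by)
    f01, f11 = f(bx, by + size), f(bx + size, by + size)
    worst = F(0)
    for i in range(samples + 1):
        for j in range(samples + 1):
            u, v = F(i, samples), F(j, samples)
            g = (f00 * (1 - u) * (1 - v) + f10 * u * (1 - v)
                 + f01 * (1 - u) * v + f11 * u * v)
            worst = max(worst, abs(f(bx + u * size, by + v * size) - g))
    return worst

def segment(bx, by, size, tolerance, leaves, min_size=F(1, 64)):
    """Split a square into four equal squares until the error is small."""
    if size > min_size and max_error(bx, by, size) > tolerance:
        half = size / 2
        for dx in (0, half):
            for dy in (0, half):
                segment(bx + dx, by + dy, half, tolerance, leaves, min_size)
    else:
        leaves.add((bx, by, size))

leaves = set()
segment(F(0), F(0), F(4), F(1, 64), leaves)
# Mirror every segment across the line X = Y and compare the two sets.
mirrored = {(by, bx, size) for (bx, by, size) in leaves}
print(len(leaves), leaves == mirrored)
```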